In real-world data analysis, your data will likely:
Fortunately, pandas can help you with all of this!
cols_to_keep = ['INSTNM', 'STABBR', 'GRAD_DEBT_MDN_SUPP']
debt = full[cols_to_keep]
debt.columns = ['name', 'state', 'debt']
debt.head()

.loc[] method (note the brackets)
& (and) and | (or)
.isin() method: checks to see if value is in list of values

tx_debt = debt[(debt['debt'] != 'PrivacySuppressed') & (debt['state'] == 'TX')]
# Alternatively, use the .query() method
# tx_debt = debt.query('debt != "PrivacySuppressed" & state == "TX" ')
tx_debt.head()

states = ['OK', 'NM', 'TX', 'LA']
sw_debt = debt[(debt['debt'] != 'PrivacySuppressed') & (debt['state'].isin(states))]
sw_debt.head()

.assign() method

# Must use bracket indexing (df1['col4']) in this new column definition
df1['col4'] = df1['col1'] + df1['col2']
# With .assign()
df2 = df1.assign(col5 = df1.col3 / df1.col4)
df2.head()

dtype conversion
.astype() method
SettingWithCopyWarning
.dropna() method: delete all rows (or columns) that have any missing values (NaN in pandas)
.fillna() method: fill in missing data with a specified value
pandas: .groupby() method!
Process:
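As a rough sketch of how the dtype conversion, missing-data, and .groupby() steps fit together (split the rows into groups, apply a summary function, combine the results), the snippet below uses a small made-up table. The toy DataFrame, its values, and the names clean, dropped, filled, and median_by_state are illustrative assumptions, not part of the original dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the debt table; all values are made up
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'state': ['TX', 'TX', 'OK', 'OK', 'NM'],
    'debt': ['10000', 'PrivacySuppressed', '15000', '25000', '12000'],
})

# Take an explicit copy of the subset so that adding a column later
# does not trigger SettingWithCopyWarning
clean = toy[toy['debt'] != 'PrivacySuppressed'].copy()

# dtype conversion: the debt column holds strings, so cast it to integers
clean['debtnum'] = clean['debt'].astype(int)

# Missing data: errors='coerce' turns unparseable strings into NaN
toy['debtnum'] = pd.to_numeric(toy['debt'], errors='coerce')
dropped = toy.dropna(subset=['debtnum'])  # drop rows where debtnum is NaN
filled = toy.fillna({'debtnum': 0})       # or replace NaN with a chosen value

# groupby: split by state, take the median of each group, combine the results
median_by_state = clean.groupby('state')['debtnum'].median()
print(median_by_state)
```

Whether to drop or fill missing values depends on the analysis; filling suppressed debt figures with 0 would distort a median, which is why the grouped summary here uses the filtered copy instead.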
.groupby() in pandas

seaborn

import seaborn as sns
sns.set(style = "darkgrid", rc = {"figure.figsize": (8, 6)})
sns.boxplot(x = 'state', y = 'debtnum', data = sw2, orient = 'v')

seaborn

for s in sw2.state.unique():
    data = sw2[sw2.state == s]
    # fill replaces the deprecated shade argument (seaborn 0.11 and later)
    sns.kdeplot(data.debtnum, fill = True, label = s)

.merge() method in pandas
Merge types in pandas (how parameter): 'inner' (default), 'left', 'right', and 'outer'

from pandas_datareader import wb
countries = ['ZA', 'BR', 'US']
urban = wb.download(indicator = 'SP.URB.TOTL.IN.ZS',
                    country = countries, start = 1960,
                    end = 2016).reset_index()
urban.head()

.pivot() method in pandas

urban_wide = urban.pivot(index = 'year', columns = 'country',
                         values = 'SP.URB.TOTL.IN.ZS')
urban_wide.head()

pd.melt() function in pandas

urban_long = pd.melt(urban_wide.reset_index(), id_vars = 'year',
                     var_name = 'country', value_name = 'pcturban')
urban_long.head()

urban_long['year'] = urban_long['year'].astype(int)
sns.lineplot(x = "year", y = "pcturban",
             hue = "country", data = urban_long)
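
Returning to the .merge() method: the four values of the how parameter are easiest to compare on two tiny tables. This is a minimal sketch; the schools and regions DataFrames and all of their values are hypothetical and chosen only so that each table has one key the other lacks.

```python
import pandas as pd

# Hypothetical tables sharing a 'state' key; all values are made up
schools = pd.DataFrame({'state': ['TX', 'OK', 'NM'], 'n_schools': [3, 2, 1]})
regions = pd.DataFrame({'state': ['TX', 'OK', 'LA'],
                        'region': ['South', 'South', 'South']})

inner = schools.merge(regions, on='state')               # keys found in both tables
left = schools.merge(regions, on='state', how='left')    # every key from schools
right = schools.merge(regions, on='state', how='right')  # every key from regions
outer = schools.merge(regions, on='state', how='outer')  # union of all keys

# Show which states survive each kind of merge
for label, frame in [('inner', inner), ('left', left),
                     ('right', right), ('outer', outer)]:
    print(label, sorted(frame['state']))
```

Rows that have no match on the other side are kept only by 'left', 'right', or 'outer', and their missing columns are filled with NaN.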